Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis

Yang, Hongru, Kailkhura, Bhavya, Wang, Zhangyang, Liang, Yingbin

arXiv.org Artificial Intelligence

Understanding the training dynamics of transformers is important for explaining the impressive capabilities of large language models. In this work, we study the dynamics of training a shallow transformer on the task of recognizing the co-occurrence of two designated words. Prior work on the training dynamics of transformers commonly adopts simplifications such as weight reparameterization, attention linearization, special initialization, and the lazy regime. In contrast, we analyze the gradient flow dynamics of simultaneously training three attention matrices and a linear MLP layer from random initialization, and provide a framework for analyzing such dynamics via a coupled dynamical system. We establish that training reaches a near-minimum loss and characterize the attention model after training. We discover that gradient flow serves as an inherent mechanism that naturally divides the training process into two phases. In Phase 1, the linear MLP quickly aligns with the two target signals for correct classification, whereas the softmax attention remains almost unchanged. In Phase 2, the attention matrices and the MLP evolve jointly to enlarge the classification margin and reduce the loss to a near-minimum value. Technically, we prove a novel property of the gradient flow, termed "automatic balancing of gradients", which enables the loss values of different samples to decrease at almost the same rate and further facilitates the proof of near-minimum training loss. We also conduct experiments to verify our theoretical results.
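
To make the setup concrete, the following is a minimal sketch of a co-occurrence label and a forward pass through one softmax attention layer (three attention matrices) followed by a linear MLP. It is not the authors' implementation: the abstract does not specify the embedding, pooling, label convention, or dimensions, so the one-hot embedding, the mean pooling over query positions, the ±1 labels, and the names `label`, `forward`, `W_Q`, `W_K`, `W_V`, and `w_mlp` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def one_hot(i, d):
    # Assumption: tokens are embedded as one-hot vectors of size d.
    return [1.0 if j == i else 0.0 for j in range(d)]


def label(tokens, a=0, b=1):
    """Assumed label: +1 if both designated words a and b occur, else -1."""
    return 1 if (a in tokens and b in tokens) else -1


def forward(tokens, W_Q, W_K, W_V, w_mlp, d):
    """Scalar score from one softmax attention layer plus a linear MLP.

    Assumption: the per-position outputs are averaged into one score.
    """
    X = [one_hot(t, d) for t in tokens]
    Q = [matvec(W_Q, x) for x in X]
    K = [matvec(W_K, x) for x in X]
    V = [matvec(W_V, x) for x in X]
    score = 0.0
    for q in Q:
        attn = softmax([dot(q, k) for k in K])
        # Attention-weighted average of the value vectors.
        ctx = [sum(w * v[j] for w, v in zip(attn, V)) for j in range(len(V[0]))]
        score += dot(w_mlp, ctx)
    return score / len(tokens)


def identity(d):
    return [one_hot(i, d) for i in range(d)]


def zeros(d):
    return [[0.0] * d for _ in range(d)]
```

With zero query and key matrices the attention is uniform, so the score depends only on the linear MLP and the value matrix. This loosely mirrors the Phase 1 picture in the abstract, where the MLP aligns with the two target words while the attention stays nearly unchanged.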


3D Programming of Patterned Heterogeneous Interface for 4D Smart Robotics

Song, Kewei, Xiong, Chunfeng, Zhang, Ze, Wu, Kunlin, Wan, Weiyang, Wang, Yifan, Umezu, Shinjiro, Sato, Hirotaka

arXiv.org Artificial Intelligence

Shape memory structures play an important role in many cutting-edge intelligent fields. However, existing technologies can only realize 4D printing of a single polymer or metal, which limits practical applications. Here, we report a construction strategy for a TSMP/M heterointerface that uses a Pd2+-containing shape memory polymer (AP-SMR) to induce an electroless plating reaction and relies on molecular dynamics; the resulting interface combines shape memory properties with metal activity and information processing power. Through multi-material DLP 3D printing technology, the interface can be selectively programmed in 3D on functional substrate parts of arbitrary shape to form 4D electronic smart devices (robotics). Microscopically, this type of interface appears as a composite structure with an interface height on the nanometer-to-micrometer scale, composed of a pure substrate layer (smart materials), an intermediate layer (a composite structure in which metal particles are embedded in a cross-linked polymer network), and a pure metal layer. A structure programmed with the TSMP/M heterointerface exhibits both SMA characteristics and metal properties, and thus offers more intelligent functions (electroactivity, electrothermal deformation, electronically controlled denaturation) and higher performance (selective control, remote control, inline control, and low-voltage control of shape memory structures can be realized). This is expected to provide a more flexible manufacturing process as a platform technology for designing, manufacturing, and applying smart devices based on new concepts, and to promote the development of cutting-edge industries such as smart robots and smart electronics.